Out-of-Vocabulary Word Modeling and Rejection for Spanish Keyword Spotting Systems
Abstract
This paper presents a combination of out-of-vocabulary (OOV) word modeling and rejection techniques in an attempt to accept utterances embedding a keyword and reject utterances with nonkeywords. The goal of this research is to develop a robust, task-independent Spanish keyword spotter and to develop a method for optimizing confidence thresholds for a particular context. To model OOV words, we employed both word and sub-word units as fillers, combined with n-gram language models. We also introduce a methodology for optimizing confidence thresholds to control the tradeoffs between acceptance, confirmation, and rejection of utterances. Our experiments are based on a Mexican Spanish auto-attendant system using the SpeechWorks recognizer release 6.5 Second Edition, in which we achieved a reduction in error of 8.9% as compared to the baseline system. Most of the error reduction is attributed to better keyword detection in utterances that contain both keywords and OOV words.
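The three-way tradeoff between acceptance, confirmation, and rejection can be illustrated with two confidence thresholds. The sketch below is not the paper's method: the cost table, the grid search, and the names `decide`, `total_cost`, and `optimize_thresholds` are assumptions made for illustration only. It shows how a pair of thresholds might be tuned on labeled confidence scores for one context.

```python
from itertools import combinations_with_replacement

def decide(score, t_reject, t_accept):
    """Map a confidence score to an action using two thresholds."""
    if score >= t_accept:
        return "accept"
    if score >= t_reject:
        return "confirm"
    return "reject"

# Hypothetical costs, keyed by (utterance contains a keyword, action taken).
# These values are illustrative assumptions, not figures from the paper.
COSTS = {
    (True, "accept"): 0.0,
    (True, "confirm"): 0.2,   # extra dialogue turn for a valid keyword
    (True, "reject"): 1.0,    # false rejection
    (False, "accept"): 1.0,   # false acceptance
    (False, "confirm"): 0.3,
    (False, "reject"): 0.0,
}

def total_cost(samples, t_reject, t_accept, costs=COSTS):
    """Sum the cost over (score, has_keyword) pairs for one threshold pair."""
    return sum(
        costs[(has_keyword, decide(score, t_reject, t_accept))]
        for score, has_keyword in samples
    )

def optimize_thresholds(samples, grid, costs=COSTS):
    """Grid-search all pairs with t_reject <= t_accept; return (cost, t_reject, t_accept)."""
    best = None
    for t_reject, t_accept in combinations_with_replacement(sorted(grid), 2):
        cost = total_cost(samples, t_reject, t_accept, costs)
        if best is None or cost < best[0]:
            best = (cost, t_reject, t_accept)
    return best

# Toy development set: (confidence score, utterance contains a keyword)
samples = [(0.9, True), (0.8, True), (0.2, False), (0.1, False)]
best_cost, t_reject, t_accept = optimize_thresholds(samples, [0.0, 0.5, 1.0])
```

Changing the cost table shifts the chosen thresholds, which is one way a deployment could trade false acceptances against extra confirmation prompts.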
Similar resources
A Word-spotting Hypothesis Testing for Accepting/Rejecting Continuous Speech Recognition Output
The word rejection problem in speech recognition is formulated in a framework of word-spotting, where a spotted word is verified through a binary, acceptance/rejection decision. A generalized word posterior probability (GWPP), used as the sole confidence measure, is computed in a word graph, via the forward-backward algorithm or in an N-best list, using string likelihoods. The GWPP is further e...
Lexical Access-based Confidence Measure for a Spanish Keyword Spotting System
Keyword spotting deals with the search for a reduced set of keywords in audio content. Phone lattice-based approaches are very fast but achieve poor results. HMM-based keyword spotting systems use filler models to absorb the out-of-vocabulary (OOV) words and achieve the best results, although they are slower. We propose a technique which combines them in order to perform a confidence measure to...
Performance Improvement in Keyword Spotting for Telephony Services
In this paper, a new hybrid approach is presented for keyword spotting. The proposed method is based on the Hidden Markov Model (HMM) and is performed in two stages. In the first stage, a series of candidate keywords is recognized using phoneme models. In the second stage, word models are used to decide on acceptance or rejection of each candidate keyword. Two different methods are presented in...
A Study on Out-of-vocabulary Word Modeling for a Segment-based Keyword Spotting System
The purpose of a word spotting system is to detect a certain set of keywords in continuous speech. The most common approach consists of models of the keywords augmented with "filler," or "garbage" models, that are trained to account for non-keyword speech and background noise. Another approach is to use a large vocabulary continuous speech recognition system (LVCSR) to produce the most likely hy...
A comparison of grapheme and phoneme-based units for Spanish spoken term detection
The ever-increasing volume of audio data available online through the world wide web means that automatic methods for indexing and search are becoming essential. Hidden Markov model (HMM) keyword spotting and lattice search techniques are the two most common approaches used by such systems. In keyword spotting, models or templates are defined for each search term prior to accessing the speech a...